Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the mapping of sequence to log block index in transaction log iterator #12538

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

HypenZou
Copy link
Contributor

@HypenZou HypenZou commented Apr 15, 2024

Context/Summary:

The official documentation recommends using the GetUpdatesSince method for master-slave synchronization. Every time this method called, it will search for the start sequence from the beginning of the target WAL log. It may costs too much to seek to start sequence when wal file size is large. If high requirements are placed on master-slave delay, SeekToStartSequence() may be called very frequently and cause a large amount of read IO (although in most cases read page cache).
There are some issues discussing about it:
#12516
#889

In most scenarios, the start sequences of the GetUpdatesSince method are continuous. Caching the previously seek position can avoid most repeated readings of the wal file.

This commit will cache the mapping of sequence to WAL file block index. It does this both after seeking to the start sequence of the iterator and after iterating to the end of the iterator. When seeking to start sequence, the log reader will lookup the first batch sequence that is not larger than the target sequence and then skip wal file read pointer to the start of the block, instead of seeking from the start.

The optimization effect on the getupdatesSince interface is quite obvious
image

Test:
covered by ./db_log_iter_test

@HypenZou HypenZou changed the title Cache sequence to log block index in transaction log iterator Cache the mapping of sequence to log block index in transaction log iterator Apr 15, 2024
@HypenZou HypenZou force-pushed the get-update-since branch 2 times, most recently from f26e11d to de39db5 Compare April 16, 2024 05:40
@HypenZou HypenZou force-pushed the get-update-since branch 2 times, most recently from ae84b5c to d1ad917 Compare May 7, 2024 16:39
@HypenZou
Copy link
Contributor Author

HypenZou commented May 7, 2024

@ajkr @jowlyzhang hi, could you please review my code when you have time? Any comments or suggestions are welcome ~

reusing the iterator may be a workaround, but it may be weird and undefined to use a iterator after it is invalid. Additionally, the comments for the Next() method state that it requires Valid() to be true.

@jsteemann
Copy link
Contributor

We have also observed that WAL tailing using consecutive calls to GetUpdatesSince() with increasing sequence number offsets result in "random" latency. The latency seems to depend on how many WAL records need to be skipped for the call, which is exactly what would be addressed by this PR. So this would be a very welcome improvement for us, and we would be very happy if this could be included in one of the next releases.

ReportCorruption(fragment.size(),
"missing start of fragmented record(2)");
if (!skipped_) {
ReportCorruption(fragment.size(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also clear skipped_ here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thx for review

@@ -330,6 +330,26 @@ TEST_F(DBTestXactLogIterator, TransactionLogIteratorBlobs) {
"Delete(0, key2)",
handler.seen);
}

TEST_F(DBTestXactLogIterator, TransactionIteratorCache) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would pass before this change too. Maybe check that cache is effective by check some statistics on IO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update it, thx for review

db/log_reader.cc Outdated
@@ -130,13 +132,16 @@ bool Reader::ReadRecord(Slice* record, std::string* scratch,
prospective_record_offset = physical_record_offset;
scratch->assign(fragment.data(), fragment.size());
in_fragmented_record = true;
skipped_ = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to also clear after kUserDefinedTimestampSizeType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thx for review

db/log_reader.h Outdated
@@ -170,6 +185,10 @@ class Reader {
// is only for WAL logs.
UnorderedMap<uint32_t, size_t> recorded_cf_to_ts_sz_;

// if log reader is skipped, may need to drop bytes
// until seek to the first of a record
bool skipped_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing the name to something like skip_first_fragmented_record_ since this field will be set to false after skipping a record.

Copy link
Contributor Author

@HypenZou HypenZou Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done ✅ , thx for reveiw

// if true, the mapping of db sequence to WAL file block index will be
// cached. This prevents repeated reading from the beginning of the
// target wal log when GetUpdatesSince() is called.
// Default: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not enable this by default since this won't work with WAL compression, where the first record sets the compression type.

Copy link
Contributor Author

@HypenZou HypenZou Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done , thx for reveiw~
and now will return not support err when wal compression and with_cache are both set

if (read_options_.with_cache_) {
// cache the mapping of sequence to log block index when seeking to the
// start or end sequence
if ((current_batch_seq_ <= starting_sequence_number_ &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case to cache for the start sequence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there are multiple slaves catching up, the start sequence may not necessarily be incremental; Also, there may be scenarios that do not iterate to the end sequence but calling GetUpdateSince frequently. Cache at the beginning and end sounds reasonable for me

uint64_t block_index;
SeqWithFileBlockIdx(uint64_t log_num, uint64_t seq_num, uint64_t block_idx)
: log_number(log_num), seq_number(seq_num), block_index(block_idx){};
bool operator<(const SeqWithFileBlockIdx& other) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would block_index be different with the same log_number and seq_number?

Copy link
Contributor Author

@HypenZou HypenZou Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be such a situation, but retaining the judgment for block_idx seems no problem
update: remove the judgement and add assertion to ensure it's same.

std::lock_guard<std::mutex> lk{mutex_};
if (cache_.size() > size_) {
auto iter = cache_.begin();
std::advance(iter, rand_.Next() % cache_.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about always erasing the oldest one? I assume GetUpdatesSince() is called with likely increasing sequence numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has changed to replace the oldest one. thx for the advice

@HypenZou HypenZou force-pushed the get-update-since branch 6 times, most recently from 0f276f3 to 89231e3 Compare June 18, 2024 15:32
@HypenZou HypenZou requested a review from cbi42 June 18, 2024 16:21
@HypenZou HypenZou force-pushed the get-update-since branch 2 times, most recently from a382537 to 0da6a66 Compare June 20, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants